Empirical distributions of time intervals between COVID-19 cases and more severe outcomes in Scotland

A critical factor in infectious disease control is the risk of an outbreak overwhelming local healthcare capacity. The overall demand on healthcare services will depend on disease severity, but the precise timing and size of peak demand also depend on the time interval (or clinical time delay) between initial infection and the development of severe disease. A broader distribution of intervals may spread that demand over a longer period, with a lower peak. These interval distributions are therefore important for modelling trajectories of, e.g., hospital admissions, given a trajectory of incidence. Conversely, as testing rates decline, an incidence trajectory may need to be inferred from the delayed, but relatively unbiased, signal of hospital admissions. Healthcare demand has been extensively modelled during the COVID-19 pandemic, in which localised waves of infection have imposed severe stresses on healthcare services. While the initial acute threat posed by this disease has since subsided as immunity has built up through vaccination and prior infection, prevalence remains high, and waning immunity may lead to substantial pressures for years to come. In this work, we therefore present a set of interval distributions between COVID-19 cases and subsequent severe outcomes: hospital admission, ICU admission, and death. These may be used to model more realistic scenarios of hospital admissions and occupancy, given a trajectory of infections or cases. We present a method for obtaining empirical distributions using COVID-19 outcomes data from Scotland between September 2020 and January 2022 (N = 31724 hospital admissions, N = 3514 ICU admissions, N = 8306 mortalities). We present separate distributions by age, sex, and deprivation of the community of residence. 
While the risk of severe disease following COVID-19 infection is substantially higher for the elderly and those residing in areas of high deprivation, the length of stay shows no strong dependence on these factors, suggesting that severe outcomes are equally severe across risk groups. As Scotland and other countries move into a phase where testing is no longer abundant, these intervals may be useful for retrospective modelling of patterns of infection, given data on severe outcomes.


A Associating outcomes
We define an interval ∆t_AB ≥ 0 as the time difference between two different COVID-19 outcomes A and B, given as a whole number of days. We do not differentiate by any intermediate outcomes; for example, the case-to-mortality interval includes both patients who were and were not admitted to an ICU.
We link different events with one another. For example, consider a hospitalisation entry H, for which we are attempting to associate a case C. To do this we:
1. Search the eDRIS test data for cases {C} whose DZ, age range and sex match those of H, and which occurred on the same day as, or up to 28 days before, H.
2. If at least one possible matching case is found, take the interval ∆t_CH as the time difference between H and the median date of the candidate cases {C}. Otherwise, label the hospitalisation entry H as unlinked.
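The two linking steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Record` type, its field names and the helper `link_interval` are hypothetical, and using `statistics.median_low` so that the median stays on an actual day when the number of candidates is even is an assumption (the text does not say how that case is resolved). The default window follows the 21-day search used for most intervals; pass 28 for case-to-mortality intervals.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median_low
from typing import Optional, Sequence


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal event record; field names are illustrative only."""
    dz: str        # data zone (DZ) of residence
    age_band: str  # age range
    sex: str
    day: date      # date of the positive test, admission or death


def link_interval(event: Record, cases: Sequence[Record], window_days: int = 21) -> Optional[int]:
    """Return the interval in whole days between the median candidate case and `event`,
    or None if no candidate case exists (the event is 'unlinked')."""
    key = (event.dz, event.age_band, event.sex)
    t_event = event.day.toordinal()
    candidates = [
        c.day.toordinal()
        for c in cases
        # Same DZ, age range and sex, on the same day or up to `window_days` before the event.
        if (c.dz, c.age_band, c.sex) == key and 0 <= t_event - c.day.toordinal() <= window_days
    ]
    if not candidates:
        return None
    # Assumption: median_low keeps the median on a real date when the count is even.
    return t_event - median_low(candidates)


h = Record("S01", "60-69", "F", date(2021, 1, 10))
cases = [
    Record("S01", "60-69", "F", date(2021, 1, 1)),
    Record("S01", "60-69", "F", date(2021, 1, 3)),
    Record("S01", "60-69", "F", date(2021, 1, 5)),
    Record("S01", "60-69", "M", date(2021, 1, 9)),   # different sex: ignored
    Record("S01", "60-69", "F", date(2021, 1, 11)),  # after the admission: ignored
]
print(link_interval(h, cases))  # median candidate is 3 January, so prints 7
```

Taking the median of all matching candidates, rather than the closest one, avoids biasing the interval short when several people in the same small stratum test positive around the same time.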
We associate outcomes up to 21 days apart, except for case-to-mortality intervals, where we search over 28 days. For case intervals, unlinked instances are reports of more severe outcomes without an associated, previously reported case. For nosocomial intervals, unlinked outcomes are common (such as a mortality without an ICU admission) and are not counted.
Finally, we omit from the data any events with incomplete age, sex or DZ entries (as we use these to associate different outcomes), as well as repeat admissions by the same individual within a 60-day window, keeping only the first admission.
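A minimal sketch of this cleaning step is given below. The row layout, the patient identifier and the function `first_admissions` are hypothetical, and two details are assumptions because the text does not pin them down: the 60-day window is inclusive, and it is measured from the individual's most recent *kept* admission.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple

# Hypothetical row layout: (patient_id, age_band, sex, dz, admission_day).
Admission = Tuple[str, Optional[str], Optional[str], Optional[str], date]


def first_admissions(rows: Iterable[Admission], window_days: int = 60) -> List[Admission]:
    """Drop rows with missing age/sex/DZ, then keep only the first admission of each
    individual, discarding readmissions within `window_days` of the last kept one."""
    kept: List[Admission] = []
    last_kept = {}  # patient_id -> day of that patient's most recent kept admission
    for row in sorted(rows, key=lambda r: (r[0], r[4])):
        pid, age_band, sex, dz, day = row
        if not (age_band and sex and dz):
            continue  # incomplete entries cannot be linked to other outcomes
        prev = last_kept.get(pid)
        if prev is not None and (day - prev).days <= window_days:
            continue  # repeat admission inside the window: drop it
        last_kept[pid] = day
        kept.append(row)
    return kept


rows = [
    ("p1", "70-79", "M", "S01", date(2021, 1, 1)),
    ("p1", "70-79", "M", "S01", date(2021, 2, 1)),   # 31 days later: dropped
    ("p1", "70-79", "M", "S01", date(2021, 4, 15)),  # 104 days later: kept
    ("p2", "50-59", "F", None, date(2021, 1, 5)),    # missing DZ: dropped
]
cleaned = first_admissions(rows)
```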

B Distribution fits
For fitting the empirical distributions derived from the eDRIS data, we choose two-parameter gamma distributions,

$$P(\Delta t) = \lambda\, \Delta t^{\alpha - 1} e^{-\beta \Delta t},$$

for 0 ≤ ∆t ≤ 21 days (28 days for ∆t_CM) and zero outside this range. Here the shape parameter α determines the characteristic form of the distribution at smaller ∆t, the rate parameter β determines the rate of exponential decay as ∆t increases, and λ is a normalising constant fixing $\int_0^\infty d(\Delta t)\, P(\Delta t) = 1$. A gamma distribution allows us to flexibly fit distributions whose mode is either at or away from zero days. We perform the fits using the fitdist function from the fitdistrplus package (version 1.1-8) in R (version 4.1.3).
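Because the distribution is truncated at t_max = 21 days (28 for ∆t_CM), the normalising constant follows in closed form as λ = β^α / (Γ(α) P(α, β t_max)), where P is the regularised lower incomplete gamma function. The sketch below evaluates the truncated density with the standard library only, using the power series for P; the function names are illustrative, and the fitting itself (done in R in the text) is not reproduced.

```python
import math


def reg_lower_gamma(a: float, x: float, max_terms: int = 1000) -> float:
    """Regularised lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def truncated_gamma_pdf(dt: float, alpha: float, beta: float, t_max: float = 21.0) -> float:
    """P(dt) = lam * dt**(alpha - 1) * exp(-beta * dt) on [0, t_max], zero elsewhere."""
    if dt < 0.0 or dt > t_max:
        return 0.0
    # lam normalises the truncated density so that it integrates to 1 over [0, t_max].
    lam = math.exp(alpha * math.log(beta) - math.lgamma(alpha)) / reg_lower_gamma(alpha, beta * t_max)
    if dt == 0.0:
        # Density at zero: lam if alpha == 1, 0 if alpha > 1, divergent if alpha < 1.
        return lam if alpha == 1.0 else (0.0 if alpha > 1.0 else math.inf)
    return lam * dt ** (alpha - 1.0) * math.exp(-beta * dt)
```

The divergence at ∆t = 0 for α < 1 is one practical reason to treat same-day events separately, as done for case intervals.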
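To account for instances where the case and a more severe outcome occur on the same day (∆t = 0), case intervals are fit across ∆t ≥ 1 only, with a zero-inflation ν, fit separately, reflecting the proportion of all "same-day" events:

$$P(\Delta t) = \begin{cases} \nu, & \Delta t = 0, \\ \lambda\, \Delta t^{\alpha - 1} e^{-\beta \Delta t}, & 1 \le \Delta t \le 21 \text{ days (28 days for } \Delta t_{CM}), \\ 0, & \text{otherwise,} \end{cases}$$

where λ here fixes $\int_1^\infty d(\Delta t)\, P(\Delta t) = 1 - \nu$.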
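The split into a same-day share ν and a gamma fit over ∆t ≥ 1 can be illustrated as follows. Estimating ν as the fraction of in-range intervals equal to zero follows the text; the gamma part here uses a simple method-of-moments estimate (mean α/β, variance α/β²) as a crude, swapped-in stand-in for the maximum-likelihood fit that fitdist performs by default, and it ignores both truncation and discreteness. The function name is hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence, Tuple


def zero_inflated_moments(intervals: Sequence[int], t_max: int = 21) -> Tuple[float, float, float]:
    """Return (nu, alpha, beta): nu is the share of same-day (dt = 0) events; alpha and
    beta are method-of-moments gamma estimates from the intervals with dt >= 1."""
    in_range = [d for d in intervals if 0 <= d <= t_max]
    nu = sum(1 for d in in_range if d == 0) / len(in_range)
    positive = [d for d in in_range if d >= 1]
    m, v = mean(positive), pvariance(positive)
    # A gamma with shape alpha and rate beta has mean alpha/beta and variance alpha/beta**2.
    return nu, m * m / v, m / v


nu, alpha, beta = zero_inflated_moments([0, 0, 2, 4, 30])  # the 30-day interval is out of range
```

In this toy input, half of the in-range intervals are same-day events, so ν = 0.5, and the gamma part is estimated only from the intervals of 2 and 4 days.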

C Estimation of hospitalisation-to-discharge intervals
In this section we detail the method used to estimate a distribution for the interval between hospitalisation and discharge, for patients presumed not to die in hospital. Because we do not have explicit admission-to-discharge times for individual patients, this is necessarily a much broader, population-level estimate.
We instead rely on public, national-level occupancy data, provided by PHS.
For those admitted with COVID-19 that go on to die, we first use the eDRIS data (and associated intervals between hospital admission and death) to derive a partial occupancy timeseries. The difference between this occupancy and the overall PHS occupancy is then taken as the occupancy of admitted individuals that are discharged. Finally, knowing the admission dates of patients that go on to survive (i.e., do not have an associated death) from the eDRIS data, we estimate the hospitalisation-to-discharge interval distribution, and thus how much surviving individuals on average contribute to the hospitalisation occupancy burden.
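The first two steps (building the partial occupancy of patients who die in hospital, then subtracting it from the total occupancy) can be sketched as below. The input layout and function names are hypothetical, and the counting convention, in which a patient occupies a bed on every day from admission up to but not including the day of death, is an assumption; the census convention behind the PHS occupancy figures may differ.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple


def death_occupancy(stays: Sequence[Tuple[date, date]], start: date, n_days: int) -> List[int]:
    """O_M(t): daily bed occupancy of patients who go on to die in hospital.
    Assumed convention: a patient is counted on days with admission <= t < death."""
    occ = [0] * n_days
    for admitted, died in stays:
        for k in range(n_days):
            if admitted <= start + timedelta(days=k) < died:
                occ[k] += 1
    return occ


def discharged_occupancy(total: Sequence[int], deaths: Sequence[int]) -> List[int]:
    """O_D(t) = O(t) - O_M(t): occupancy attributed to patients who are discharged."""
    return [o - m for o, m in zip(total, deaths)]


start = date(2021, 1, 1)
stays = [(date(2021, 1, 1), date(2021, 1, 3)), (date(2021, 1, 2), date(2021, 1, 6))]
o_m = death_occupancy(stays, start, 5)
o_d = discharged_occupancy([5, 6, 6, 4, 3], o_m)  # toy total occupancy O(t)
```

With real data, small mismatches between the two sources could make individual days of O_D(t) negative, which is worth checking before fitting.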
Formally, the trajectory of hospital admissions A(t) includes those who go on to be discharged (and, we assume, recover), A_D(t), and those who die in hospital, A_M(t):

$$A(t) = A_D(t) + A_M(t).$$

Similarly, the trajectory of COVID-19 hospital occupancy O(t) includes the occupancy of those who go on to recover and be discharged, O_D(t), and of those who go on to die, O_M(t):

$$O(t) = O_D(t) + O_M(t).$$

To estimate the discrete distribution P(∆t_HD) for the (H)ospital admission-to-(D)ischarge interval, we first rewrite the occupancy of those eventually discharged as

$$O_D(t) = O(t) - O_M(t).$$

We assume P(∆t_HD) follows a zero-inflated exponential distribution, with the zero-inflation accounting for individuals who are admitted but discharged without an overnight stay. We use a standard Approximate Bayesian Computation (ABC) algorithm in a two-parameter space (exponential decay rate β and zero-inflation ν), and obtain separate fits for the periods 10 September 2020 to 30 April 2021, and 1 May 2021 to 6 January 2022.
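To make the zero-inflated exponential model concrete, the sketch below assumes one possible discretisation (the exact form used in the original model is not spelled out in this section): a point mass ν at ∆t_HD = 0 and a geometric tail with decay rate β for ∆t_HD ≥ 1. It also assumes that a patient admitted on day t′ with interval ∆t occupies a bed on days t′, …, t′ + ∆t − 1, so the modelled occupancy is Õ_D(t) = Σ_{t′≤t} A_D(t′) P(∆t_HD > t − t′). Under these assumptions P(∆t_HD > k) = (1 − ν) e^{−βk}, and the sum can be accumulated recursively. All names are hypothetical.

```python
import math
from typing import List, Sequence


def discharge_pmf(k: int, beta: float, nu: float) -> float:
    """Assumed P(dt_HD = k): mass nu at k = 0, geometric tail with decay rate beta for k >= 1."""
    if k < 0:
        return 0.0
    if k == 0:
        return nu
    return (1.0 - nu) * (1.0 - math.exp(-beta)) * math.exp(-beta * (k - 1))


def modelled_discharged_occupancy(a_d: Sequence[float], beta: float, nu: float) -> List[float]:
    """O~_D(t) = sum_{t' <= t} A_D(t') P(dt_HD > t - t') = (1 - nu) sum A_D(t') exp(-beta (t - t'))."""
    decay = math.exp(-beta)
    carry = 0.0  # running value of sum_{t' <= t} A_D(t') exp(-beta (t - t'))
    occ = []
    for admissions in a_d:
        carry = carry * decay + admissions
        occ.append((1.0 - nu) * carry)
    return occ


# Constant admissions settle at (1 - nu) * A / (1 - exp(-beta)) occupied beds.
steady = modelled_discharged_occupancy([10.0] * 200, beta=0.5, nu=0.2)[-1]
```

The recursion turns the convolution into a single pass over the admissions timeseries, which matters when it has to be evaluated for many sampled parameter pairs.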
The priors, ν ~ U(0, 1) and β ~ U(0.05, 1), allow for any degree of zero-inflation and restrict the mean duration of occupancy for those who stay at least one night to between 1.1 and 20 days.
We build a posterior by accepting parameters that generate modelled occupancy trajectories Õ_D(t) that best fit O_D(t). We sample 10^6 different parameter pairs, from which we keep the 1,000 that produce, via Eq. (5), the timeseries Õ*_D(t) that best fit O_D(t), minimising the summed discrepancy between the two timeseries. Posterior distributions for β and ν, the mean hospitalisation-to-discharge interval, and the overall hospitalisation-to-discharge interval distribution are presented in Fig A.

Fig A: Posterior distributions for β and ν (left), mean intervals (centre) and distributions for hospitalisation-to-discharge intervals ∆t_HD (right), across the "pre-Delta" (10 September 2020 to 30 April 2021) and "post-Delta" (1 May 2021 to 6 January 2022) periods.
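The rejection step can be sketched as follows, with the priors stated above. Three assumptions are labelled here: the occupancy model is the simple zero-inflated exponential form (1 − ν) Σ_{t′≤t} A_D(t′) e^{−β(t−t′)}, chosen for illustration; the discrepancy is taken to be the sum of squared differences; and the demonstration uses noise-free synthetic data with far fewer samples than the 10^6 used in the text. Function names and the synthetic admissions wave are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def modelled_occupancy(a_d: Sequence[float], beta: float, nu: float) -> List[float]:
    """Assumed model: (1 - nu) * sum_{t' <= t} A_D(t') exp(-beta (t - t'))."""
    decay, carry, occ = math.exp(-beta), 0.0, []
    for a in a_d:
        carry = carry * decay + a
        occ.append((1.0 - nu) * carry)
    return occ


def abc_rejection(observed: Sequence[float], a_d: Sequence[float],
                  n_samples: int, n_keep: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Sample (beta, nu) from the priors and keep the n_keep pairs with the smallest discrepancy."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_samples):
        nu = rng.uniform(0.0, 1.0)     # prior: nu ~ U(0, 1)
        beta = rng.uniform(0.05, 1.0)  # prior: beta ~ U(0.05, 1)
        model = modelled_occupancy(a_d, beta, nu)
        # Assumed discrepancy: sum of squared differences over the whole timeseries.
        ssd = sum((m - o) ** 2 for m, o in zip(model, observed))
        scored.append((ssd, beta, nu))
    scored.sort()
    return [(beta, nu) for _, beta, nu in scored[:n_keep]]


# Synthetic check: occupancy generated from known parameters beta = 0.2, nu = 0.3.
a_d = [30.0 + 25.0 * math.sin(2.0 * math.pi * t / 20.0) for t in range(80)]
observed = modelled_occupancy(a_d, beta=0.2, nu=0.3)
posterior = abc_rejection(observed, a_d, n_samples=20_000, n_keep=100, seed=1)
```

The accepted pairs approximate the posterior; the spread of the kept β and ν values, rather than only the best pair, is what a figure of posterior distributions would summarise.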